AITopics | barzilai-borwein step size

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Neural Information Processing SystemsNov-21-2025, 15:23:04 GMT

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

barzilai-borwein step size, name change, stochastic gradient descent, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian

Neural Information Processing SystemsNov-21-2025, 09:51:38 GMT

SGD that can reduce the variance and improve the complexity.

artificial intelligence, machine learning, step size, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Reviews: Barzilai-Borwein Step Size for Stochastic Gradient Descent

Neural Information Processing SystemsJan-20-2025, 19:59:08 GMT

It allows using "Option I" (taking the final iterate of the inner iteration), as is done in practice. They also propose to use a scaled version of Barzilai-Borwein to set the step-sizse for SVRG (and heuristically argue that this could also be useful for classic stochastic gradient methods too). Their experiments show that this adaptive step-size is competitive with fixed step-sizes. Clarity: The paper is very clearly-written and easy to understand (though many grammar issues remain). Significance: Although several heuristic adaptive step-size strategies exist in the literature, this is the first theoretically-justified method. It sill depends on constants that we don't know in general, but I believe is a step towards black-box SG methods. Details: Independent of the SVRG/SG results, the authors give a nice way to bound the step-size for the BB method. Normally, BB leads to a much faster rate than using a constant step-size, but in the SVRG setting your theory/experiments are just showing that it does as well as the best step-size (which is good, but it isn't better than the best step size). Finally, the paper would be much stronger if it compared to the two existing strategies that are used in practice: 1.

barzilai-borwein step size, review, stochastic gradient descent, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.73)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Neural Information Processing SystemsMar-12-2024, 17:29:49 GMT

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

gradient, step size, svrg-bb, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Stochastic Steffensen method

Zhao, Minda, Lai, Zehua, Lim, Lek-Heng

arXiv.org Artificial IntelligenceNov-28-2022

Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes -- the Steffensen method avoids second derivatives and is still quadratically convergent like Newton method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requires no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to randomized Kaczmarz method -- note that this is not true for SGD or SLBFGS -- and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.

artificial intelligence, machine learning, steffensen method, (16 more...)

arXiv.org Artificial Intelligence

2211.1531

Country:

Asia > Japan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York > Suffolk County > Melville (0.04)
(3 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu

Neural Information Processing SystemsFeb-14-2020, 06:26:34 GMT

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature.

barzilai-borwein step size, sgd-bb and svrg-bb, stochastic gradient descent, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu

Neural Information Processing SystemsDec-31-2016

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

gradient, step size, svrg-bb, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu

arXiv.org Machine LearningMay-22-2016

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result is missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

artificial intelligence, machine learning, step size, (15 more...)

arXiv.org Machine Learning

1605.04131

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology: